action cost
Is Bang-Bang Control All You Need? Solving Continuous Control with Bernoulli Policies
Real-world robotics tasks commonly manifest as control problems over continuous action spaces. When learning to act in such settings, control policies are typically represented as continuous probability distributions that cover all feasible control inputs - often Gaussians. The underlying assumption is that this enables more refined decisions compared to crude policy choices such as discretized controllers, which limit the search space but induce abrupt changes. While switching controls can be undesirable in practice as they may challenge stability and accelerate system wear-down, they are theoretically feasible and even arise as optimal strategies in some settings.
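To make the contrast with a Gaussian policy concrete, here is a minimal sketch of a bang-bang policy that samples only the two extreme actions per dimension from a Bernoulli distribution. The function name `sample_bang_bang`, the sigmoid parameterization, and the action bounds are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def sample_bang_bang(logits, a_min, a_max, rng):
    """Sample one extreme action per dimension from a Bernoulli policy.

    Hypothetical helper: each logit parameterizes the probability of
    choosing the upper action bound instead of the lower one.
    """
    logits = np.asarray(logits, dtype=float)
    # Probability of picking the upper bound (sigmoid of the logit).
    p_max = 1.0 / (1.0 + np.exp(-logits))
    pick_max = rng.random(logits.shape) < p_max
    # Every sampled action sits on a bound, never in between.
    return np.where(pick_max, a_max, a_min)

rng = np.random.default_rng(0)
actions = sample_bang_bang([-50.0, 50.0, 0.0], a_min=-1.0, a_max=1.0, rng=rng)
```

A strongly negative logit effectively pins that dimension to the lower bound and a strongly positive one to the upper bound, while a logit of zero switches between the two with equal probability.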
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
- Europe > Russia > Central Federal District > Moscow Oblast > Moscow (0.04)
- Asia > Russia (0.04)
- Asia > Middle East > Republic of Türkiye > Karaman Province > Karaman (0.04)
Learning to Explore in Diverse Reward Settings via Temporal-Difference-Error Maximization
Griesbach, Sebastian, D'Eramo, Carlo
Numerous heuristics and advanced approaches have been proposed for exploration in different settings for deep reinforcement learning. Noise-based exploration generally fares well with dense-shaped rewards and bonus-based exploration with sparse rewards. However, these methods usually require additional tuning to deal with undesirable reward settings by adjusting hyperparameters and noise distributions. Rewards that actively discourage exploration, i.e., with an action cost and no other dense signal to follow, can pose a major challenge. We propose a novel exploration method, Stable Error-seeking Exploration (SEE), that is robust across dense, sparse, and exploration-adverse reward settings. To this end, we revisit the idea of maximizing the TD-error as a separate objective. Our method introduces three design choices to mitigate instability caused by far-off-policy learning, the conflict of interest of maximizing the cumulative TD-error in an episodic setting, and the non-stationary nature of TD-errors. SEE can be combined with off-policy algorithms without modifying the optimization pipeline of the original objective. In our experimental analysis, we show that a Soft Actor-Critic agent with the addition of SEE performs robustly across three diverse reward settings in a variety of tasks without hyperparameter adjustments.
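The quantity being maximized can be illustrated with the standard one-step TD-error. The sketch below is not SEE itself and omits its three design choices; it only shows the textbook TD-error and, as an assumption, uses its absolute value as an exploration signal. The names `td_error` and `exploration_signal` are hypothetical.

```python
def td_error(reward, value, next_value, gamma=0.99, done=False):
    """One-step TD-error: r + gamma * V(s') - V(s)."""
    # No bootstrapping from the next state once the episode has ended.
    bootstrap = 0.0 if done else gamma * next_value
    return reward + bootstrap - value

def exploration_signal(reward, value, next_value, gamma=0.99, done=False):
    """Assumed exploration reward: the magnitude of the TD-error."""
    return abs(td_error(reward, value, next_value, gamma, done))

# A transition that the critic predicts poorly yields a large signal.
signal = exploration_signal(reward=-0.1, value=0.0, next_value=2.0, gamma=0.5)
```

Transitions where the value estimate is already accurate produce a signal near zero, so an agent following this signal is drawn toward states the critic has not yet learned to predict.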
- North America > United States > California > Los Angeles County > Long Beach (0.14)
- Europe > United Kingdom > England > Greater London > London (0.14)
- Europe > Austria > Vienna (0.14)
Planning with Minimal Disruption
Pozanco, Alberto, Morales, Marianela, Borrajo, Daniel, Veloso, Manuela
In many planning applications, we might be interested in finding plans that minimally modify the initial state to achieve the goals. We refer to this concept as plan disruption. In this paper, we formally introduce it, and define various planning-based compilations that aim to jointly optimize both the sum of action costs and plan disruption. Experimental results in different benchmarks show that the reformulated task can be effectively solved in practice to generate plans that balance both objectives.
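One way to picture the two objectives is to score a plan by its summed action costs plus a penalty for how far the final state departs from the initial one. The sketch below is an assumption made for illustration: it measures disruption as the number of facts that differ between the two states and combines the objectives with a hypothetical `weight`. It does not reproduce the paper's formal definition or its compilations.

```python
def plan_disruption(initial_state, final_state):
    """Count facts that were added to or removed from the initial state."""
    return len(set(initial_state) ^ set(final_state))

def plan_score(action_costs, initial_state, final_state, weight=1.0):
    """Assumed joint objective: total action cost plus weighted disruption."""
    disruption = plan_disruption(initial_state, final_state)
    return sum(action_costs) + weight * disruption

initial = {"at(robot,a)", "on(box,table)"}
final = {"at(robot,b)", "on(box,table)"}
score = plan_score([1, 1, 2], initial, final, weight=2.0)
```

Here the plan costs 4 in actions and changes two facts (one removed, one added), so with a weight of 2 the combined score is 8.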
- North America > Canada > Ontario > Toronto (0.14)
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
- North America > United States > California > Los Angeles County > Santa Monica (0.04)
Interleaved LLM and Motion Planning for Generalized Multi-Object Collection in Large Scene Graphs
Yang, Ruochu, Zhou, Yu, Zhang, Fumin, Hou, Mengxue
Household robots have been a longstanding research topic, but they still lack human-like intelligence, particularly in manipulating open-set objects and navigating large environments efficiently and accurately. To push this boundary, we consider a generalized multi-object collection problem in large scene graphs, where the robot needs to pick up and place multiple objects across multiple locations in a long mission of multiple human commands. This problem is extremely challenging since it requires long-horizon planning in a vast action-state space under high uncertainty. To this end, we propose a novel interleaved LLM and motion planning algorithm, Inter-LLM. By designing a multimodal action cost similarity function, our algorithm can both reflect the history and look into the future to optimize plans, striking a good balance of quality and efficiency. Simulation experiments demonstrate that, compared with the latest works, our algorithm improves the overall mission performance by 30% in terms of fulfilling human commands, maximizing mission success rates, and minimizing mission costs.
- North America > United States (0.14)
- Asia > China > Hong Kong (0.04)
Risk Awareness in HTN Planning
Alnazer, Ebaa, Georgievski, Ilche, Aiello, Marco
Real-world domains are characterised by uncertain situations in which acting and using resources may entail taking risks. Performing actions in such domains involves the cost of consuming some resource, such as time or energy, where knowledge about these costs can range from known to totally unknown. In autonomous vehicles, for example, actions have uncertain costs due to factors like traffic. Choosing an action requires assessing delay risks, as each road may have unpredictable congestion. Thus, these domains call not only for planning under uncertainty but also for planning while embracing risk. Turning to HTN planning, a planning technique widely used in real-world applications, one can observe that existing approaches assume risk neutrality, relying on single-valued action costs without considering risk. Here, we enhance HTN planning with risk awareness by considering expected utility theory. We introduce a general framework for HTN planning that allows modelling risk and uncertainty using a probability distribution of action costs, upon which we define risk-aware HTN planning as being capable of accounting for different risk attitudes and allowing the computation of plans that go beyond risk neutrality. We show that computing risk-aware plans requires finding plans with the highest expected utility. We argue that HTN planning agents can solve specialised risk-aware HTN planning problems by adapting existing HTN planning approaches, and we develop an approach that surpasses the expressiveness of current approaches by allowing these agents to compute plans tailored to a particular risk attitude. An empirical evaluation of two case studies highlights the feasibility and expressiveness of this approach. We also highlight open issues, such as applying the proposal beyond HTN planning, covering both modelling and plan generation.
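The role of the risk attitude can be seen in a small expected-utility calculation over cost distributions. The two utility functions, the road names, and the cost figures below are illustrative assumptions and do not come from the paper; the sketch only shows how the preferred plan can flip when the utility function changes.

```python
def expected_utility(cost_distribution, utility):
    """Expected utility of a plan given (cost, probability) pairs."""
    return sum(prob * utility(cost) for cost, prob in cost_distribution)

def risk_neutral(cost):
    # Utility falls linearly with cost, so only the mean cost matters.
    return -cost

def risk_averse(cost):
    # Assumed concave utility: high costs are penalized disproportionately.
    return -(cost ** 2)

def best_plan(plans, utility):
    """Pick the plan with the highest expected utility."""
    return max(plans, key=lambda name: expected_utility(plans[name], utility))

# Hypothetical plans: a predictable road and a road with uncertain congestion.
plans = {
    "safe_road": [(10.0, 1.0)],
    "risky_road": [(2.0, 0.5), (16.0, 0.5)],
}
neutral_choice = best_plan(plans, risk_neutral)
averse_choice = best_plan(plans, risk_averse)
```

The risky road has the lower mean cost (9 versus 10), so the risk-neutral agent prefers it, whereas the risk-averse agent prefers the safe road because the possible cost of 16 weighs heavily under the quadratic utility.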
- Europe > Germany > Baden-Württemberg > Stuttgart Region > Stuttgart (0.04)
- Europe > Italy (0.04)
- North America > United States (0.04)
- Research Report (1.00)
- Workflow (0.92)
- Transportation > Ground > Road (1.00)
- Information Technology (1.00)
- Automobiles & Trucks (1.00)
- Leisure & Entertainment (0.67)
Cost-Aware Query Policies in Active Learning for Efficient Autonomous Robotic Exploration
Akins, Sapphira, Mertens, Hans, Zhu, Frances
In missions constrained by finite resources, efficient data collection is critical. Informative path planning, driven by automated decision-making, optimizes exploration by reducing the costs associated with accurate characterization of a target in an environment. Previous implementations of active learning (AL) did not consider the action cost for regression problems or only considered the action cost for classification problems. This paper analyzes an AL algorithm for Gaussian Process regression while incorporating action cost. The algorithm's performance is compared on various regression problems, including terrain mapping on diverse simulated surfaces, along metrics of root mean square error, samples and distance until convergence, and model variance upon convergence. The cost-dependent acquisition policy doesn't organically optimize information gain over distance. Instead, the traditional uncertainty metric with a distance constraint best minimizes root-mean-square error over trajectory distance. This study's impact is to provide insight into incorporating action cost with AL methods to optimize exploration under realistic mission constraints.
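The policy reported as performing best, an uncertainty metric with a distance constraint, can be sketched as follows. This is a minimal illustration under stated assumptions: the predictive variances are passed in as a plain array (from any Gaussian Process model), and the function name `next_sample`, the Euclidean distance, and the fallback to the nearest candidate are choices made for the example, not details from the paper.

```python
import numpy as np

def next_sample(candidates, variances, position, max_distance):
    """Pick the most uncertain candidate within reach of the current position."""
    candidates = np.asarray(candidates, dtype=float)
    variances = np.asarray(variances, dtype=float)
    dists = np.linalg.norm(candidates - np.asarray(position, dtype=float), axis=1)
    reachable = dists <= max_distance
    if not reachable.any():
        # Assumed fallback: move to the nearest candidate if none is in range.
        return int(np.argmin(dists))
    # Exclude unreachable candidates, then maximize predictive variance.
    masked = np.where(reachable, variances, -np.inf)
    return int(np.argmax(masked))

candidates = [[0.0, 1.0], [0.0, 5.0], [1.0, 0.0]]
variances = [0.2, 0.9, 0.5]
choice = next_sample(candidates, variances, position=[0.0, 0.0], max_distance=2.0)
```

The most uncertain candidate lies five units away and is excluded by the constraint, so the policy settles for the most uncertain point among those within two units.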
- North America > United States > Hawaii > Honolulu County > Honolulu (0.05)
- North America > United States > New York > New York County > New York City (0.04)
- Europe > France (0.04)
Decision-Focused Learning to Predict Action Costs for Planning
Mandi, Jayanta, Foschini, Marco, Holler, Daniel, Thiebaux, Sylvie, Hoffmann, Jorg, Guns, Tias
In many automated planning applications, action costs can be hard to specify. An example is the time needed to travel through a certain road segment, which depends on many factors, such as the current weather conditions. A natural way to address this issue is to learn to predict these parameters based on input features (e.g., weather forecasts) and use the predicted action costs in automated planning afterward. Decision-Focused Learning (DFL) has been successful in learning to predict the parameters of combinatorial optimization problems in a way that optimizes solution quality rather than prediction quality. This approach yields better results than treating prediction and optimization as separate tasks. In this paper, we investigate for the first time the challenges of implementing DFL for automated planning in order to learn to predict the action costs. There are two main challenges to overcome: (1) planning systems are called during gradient descent learning, to solve planning problems with negative action costs, which are not supported in planning. We propose novel methods for gradient computation to avoid this issue. (2) DFL requires repeated planner calls during training, which can limit the scalability of the method. We experiment with different methods approximating the optimal plan as well as an easy-to-implement caching mechanism to speed up the learning process. As the first work that addresses DFL for automated planning, we demonstrate that the proposed gradient computation consistently yields significantly better plans than predictions aimed at minimizing prediction error; and that caching can temper the computation requirements.
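The difference between solution quality and prediction quality, and the idea of reusing cached plans instead of calling a planner, can be sketched together. The example below makes several assumptions for illustration: a plan is represented as a vector of action counts, the planner call is replaced by a lookup over a small cache of previously found plans, and the names `best_cached_plan` and `regret` are hypothetical. It does not implement the paper's gradient computation.

```python
import numpy as np

def best_cached_plan(costs, plan_cache):
    """Return the cheapest plan in the cache under the given action costs."""
    totals = plan_cache @ costs
    return plan_cache[int(np.argmin(totals))]

def regret(pred_costs, true_costs, plan_cache):
    """Extra true cost paid for planning with predicted instead of true costs."""
    chosen = best_cached_plan(pred_costs, plan_cache)
    optimal = best_cached_plan(true_costs, plan_cache)
    return float(chosen @ true_costs - optimal @ true_costs)

# Each row counts how often each of three actions occurs in a cached plan.
plan_cache = np.array([[1.0, 0.0, 1.0],
                       [0.0, 2.0, 0.0]])
true_costs = np.array([1.0, 2.0, 1.0])
pred_costs = np.array([3.0, 1.0, 3.0])
loss = regret(pred_costs, true_costs, plan_cache)
```

In this example the predicted costs favor the second plan, which costs 4 under the true costs, while the first plan costs only 2, so the regret is 2. A prediction can have a large error and still incur zero regret as long as it ranks the plans correctly, which is the quantity decision-focused training targets.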